# ImageNet-21k pre-trained

Vit Huge Patch14 224.orig In21k

A large-scale image feature extraction model based on the Vision Transformer (ViT) architecture, pre-trained on the ImageNet-21k dataset.

Image Classification

Swinv2 Large Patch4 Window12 192 22k

Swin Transformer v2 is a vision Transformer that handles image classification and dense recognition tasks efficiently by combining hierarchical feature maps with self-attention restricted to local windows.

Image Classification
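
The two mechanisms named in the Swin Transformer v2 description can be illustrated with a small sketch. It assumes the numbers in the model name mean patch size 4, window size 12, and input resolution 192, and that the network has four stages that each halve the feature-map side (the usual Swin layout, not stated on this page). The helpers `window_partition` and `stage_resolutions` are hypothetical names written for this illustration, not part of any library.

```python
def window_partition(grid, window):
    """Split a square 2-D grid (list of rows) into non-overlapping
    window x window blocks, each flattened to a list of tokens."""
    size = len(grid)
    if size % window != 0:
        raise ValueError("grid side must be divisible by the window size")
    windows = []
    for top in range(0, size, window):
        for left in range(0, size, window):
            block = [
                grid[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ]
            windows.append(block)
    return windows


def stage_resolutions(image_size, patch_size, num_stages=4):
    """Feature-map side length per stage, assuming each stage halves it."""
    side = image_size // patch_size
    return [side // (2 ** i) for i in range(num_stages)]


# Assumed values read from the model name: Patch4, Window12, 192
side = 192 // 4  # 48 x 48 tokens after patch embedding
grid = [[r * side + c for c in range(side)] for r in range(side)]

windows = window_partition(grid, 12)
print(len(windows), len(windows[0]))  # 16 144
print(stage_resolutions(192, 4))      # [48, 24, 12, 6]
```

Under these assumptions, self-attention in the first stage runs inside 16 windows of 144 tokens each instead of across all 2,304 tokens at once, which is where the efficiency comes from. In the later stages the feature map is no larger than the window, so attention there covers the whole map.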

Vit Large Patch32 224 In21k

This Vision Transformer (ViT) model is pre-trained on the ImageNet-21k dataset and is suitable for image classification tasks.

Image Classification
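
The ViT model names on this page encode a patch size and an input resolution, which together determine how many image patches the model processes. The sketch below works through that arithmetic, assuming "Patch14 224" and "Patch32 224" mean square patches of 14 and 32 pixels on a 224 x 224 input. The helper `patch_grid` is a hypothetical name used only for this illustration.

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Assumed values read from the model names listed above
for name, image_size, patch_size in [
    ("Vit Huge Patch14 224", 224, 14),
    ("Vit Large Patch32 224", 224, 32),
]:
    per_side, total = patch_grid(image_size, patch_size)
    print(f"{name}: {per_side}x{per_side} grid, {total} patches")
# Vit Huge Patch14 224: 16x16 grid, 256 patches
# Vit Large Patch32 224: 7x7 grid, 49 patches
```

A smaller patch size yields more patches per image, which gives finer spatial detail at a higher compute cost.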


© 2025 AIbase